numeric expression
Handling Numeric Expressions in Automatic Speech Recognition
Huber, Christian, Waibel, Alexander
This paper addresses the problem of correctly formatting numeric expressions in automatic speech recognition (ASR) transcripts. This is challenging since the expected transcript format depends on the context, e.g., 1945 (year) vs. 19:45 (timestamp). We compare cascaded and end-to-end approaches to recognize and format numeric expression, such as years, timestamps, currency amounts, and quantities. For the end-to-end approach we employed a data generation strategy using a large language model (LLM) together with a text to speech (TTS) model to generate adaptation data. The results on our test dataset show that while approaches based on LLMs perform well on recognizing formatted numeric expressions, adapted end-to-end models offer competitive performance with the advantage of lower latency and inference cost.
FiNER: Financial Numeric Entity Recognition for XBRL Tagging
Loukas, Lefteris, Fergadiotis, Manos, Chalkidis, Ilias, Spyropoulou, Eirini, Malakasiotis, Prodromos, Androutsopoulos, Ion, Paliouras, Georgios
Publicly traded companies are required to submit periodic reports with eXtensive Business Reporting Language (XBRL) word-level tags. Manually tagging the reports is tedious and costly. We, therefore, introduce XBRL tagging as a new entity extraction task for the financial domain and release FiNER-139, a dataset of 1.1M sentences with gold XBRL tags. Unlike typical entity extraction datasets, FiNER-139 uses a much larger label set of 139 entity types. Most annotated tokens are numeric, with the correct tag per token depending mostly on context, rather than the token itself. We show that subword fragmentation of numeric expressions harms BERT's performance, allowing word-level BILSTMs to perform better. To improve BERT's performance, we propose two simple and effective solutions that replace numeric expressions with pseudo-tokens reflecting original token shapes and numeric magnitudes. We also experiment with FIN-BERT, an existing BERT model for the financial domain, and release our own BERT (SEC-BERT), pre-trained on financial filings, which performs best. Through data and error analysis, we finally identify possible limitations to inspire future work on XBRL tagging.
PDDL2.1: An Extension to PDDL for Expressing Temporal Planning Domains
In recent years research in the planning community has moved increasingly toward s application of planners to realistic problems involving both time and many typ es of resources. For example, interest in planning demonstrated by the space res earch community has inspired work in observation scheduling, planetary rover ex ploration and spacecraft control domains. Other temporal and resource-intensive domains including logistics planning, plant control and manufacturing have also helped to focus the community on the modelling and reasoning issues that must be confronted to make planning technology meet the challenges of application. The International Planning Competitions have acted as an important motivating fo rce behind the progress that has been made in planning since 1998. The third com petition (held in 2002) set the planning community the challenge of handling tim e and numeric resources. This necessitated the development of a modelling langua ge capable of expressing temporal and numeric properties of planning domains. In this paper we describe the language, PDDL2.1, that was used in the competition. We describe the syntax of the language, its formal semantics and the validation of concurrent plans. We observe that PDDL2.1 has considerable modelling power --- exceeding the capabilities of current planning technology --- and presents a number of important challenges to the research community.
PDDL2.1: An Extension to PDDL for Expressing Temporal Planning Domains
In recent years research in the planning community has moved increasingly toward s application of planners to realistic problems involving both time and many typ es of resources. For example, interest in planning demonstrated by the space res earch community has inspired work in observation scheduling, planetary rover ex ploration and spacecraft control domains. Other temporal and resource-intensive domains including logistics planning, plant control and manufacturing have also helped to focus the community on the modelling and reasoning issues that must be confronted to make planning technology meet the challenges of application. The International Planning Competitions have acted as an important motivating fo rce behind the progress that has been made in planning since 1998. The third com petition (held in 2002) set the planning community the challenge of handling tim e and numeric resources. This necessitated the development of a modelling langua ge capable of expressing temporal and numeric properties of planning domains. In this paper we describe the language, PDDL2.1, that was used in the competition. We describe the syntax of the language, its formal semantics and the validation of concurrent plans. We observe that PDDL2.1 has considerable modelling power --- exceeding the capabilities of current planning technology --- and presents a number of important challenges to the research community.